ABSTRACT

In Plasmodium falciparum, perinuclear subtelomeric 
chromatin conveys monoallelic expression of viru- 
lence genes. However, proteins that directly bind to 
chromosome ends are poorly described. Here we 
identify a novel DNA/RNA-binding protein family 
that bears homology to the archaeal protein Alba 
(Acetylation lowers binding affinity). We isolated 
three of the four PfAlba paralogs as part of a 
molecular complex that is associated with the 
P. falciparum-specific TARE6 (Telomere-Associated
Repetitive Elements 6) subtelomeric region and 
showed in electromobility shift assays (EMSAs) 
that the PfAlbas bind to TARE6 repeats. In early 
blood stages, the PfAlba proteins were enriched at 
the nuclear periphery and partially co-localized with 
PfSir2, a TARE6-associated histone deacetylase 
linked to the process of antigenic variation. The 
nuclear location changed at the onset of parasite 
proliferation (trophozoite-schizont), where the 
PfAlba proteins were also detectable in the cyto- 
plasm in a punctate pattern. Using single-stranded 
RNA (ssRNA) probes in EMSAs, we found that 
PfAlbas bind to ssRNA, albeit with different 
binding preferences. We demonstrate for the first 
time in eukaryotes that Alba-like proteins bind to 
both DNA and RNA and that their intracellular 
location is developmentally regulated. Discovery of 
the PfAlbas may provide a link between the previ- 
ously described subtelomeric non-coding RNA and 
the regulation of antigenic variation. 



INTRODUCTION 

The apicomplexan parasite Plasmodium falciparum, the 
causative agent of the most lethal form of human 
malaria, undergoes a complex life cycle with distinct de- 
velopmental stages both in the Anopheles mosquito and in 
the human hosts (1). To differentiate and adapt to an 
ever-changing environment the parasite applies different 
levels of regulation to modulate gene expression through- 
out its life cycle [reviewed in (2)]. Comparative genomic 
studies have revealed an apparent paucity of transcription 
factors (TFs) in apicomplexan proteomes although the 
basal set of TFs associated with RNA polymerase II are 
conserved (3-5). Nevertheless, the recent identification of 
the TF PfMyb1 in P. falciparum (6) and the DNA-binding
protein family Apicomplexan Apetala2 (ApiAP2) (7-10), 
strongly suggests that apicomplexan parasites possess a 
larger repertoire of elements regulating gene expression 
than previously thought. Detailed transcriptome analyses 
have also revealed that the plasmodial intra-erythrocytic 
developmental cycle is accompanied by a continuous 
cascade of gene transcription (11,12). However, increasing 
evidence supports the idea that various degrees of 
post-transcriptional control exist in many, if not all, 
growth stages (13,14). One such mechanism, translational 
repression, post-transcriptionally regulates a subset of 
mRNAs during gametocyte-to-ookinete transition in 
P. berghei (15), and proteins that are involved in this 
process are also conserved in P. falciparum. 

Epigenetic regulation in P. falciparum was first 
demonstrated for multicopy genes involved in antigenic 
variation implicating histone marks (acetylation) and the 
histone deacetylase PfSir2 in variegated gene expression at 
chromosome ends (16,17). Subsequent genome-wide 
analyses using ChIP-on-chip revealed a general role for
several histone marks (methylation and acetylation) in
P. falciparum gene regulation (18,19). Moreover, 
non-coding highly repetitive subtelomeric DNA elements 
called TAREs (Telomere-Associated Repetitive Elements) 
present at virtually all P. falciparum chromosome ends
play a central role in virulence gene regulation. The 
TAREs recruit to the nuclear periphery several epigenetic 
factors that are involved in the silencing of major virulence 
gene families (17,19-21). We termed the TARE-associated 
protein complex Perinuclear Epigenetic Repression Center 
(PERC) (19). TARE6, which is the largest repetitive 
region, is composed of 21-bp repetitive units stretching
over 6 to 23 kb on different chromosome ends. TARE6 
plays a central role in the clustering of telomeres at the 
nuclear periphery (22,23). Specific proteins that directly 
bind to TARE6 DNA remain elusive. 

In this work, we purified the TARE6-associated protein 
complex and identified a new DNA/RNA-binding protein 
family in P. falciparum composed of four paralogs. 
All members contain a domain presenting strong 
homology to the archaeal chromatin protein family 
Alba (Acetylation lowers binding affinity; InterPro 
IPR002775). We show that the P. falciparum Alba 
proteins (PfAlba1-4) are able to directly bind to TARE6
DNA repeats and to single-stranded RNA (ssRNA) with 
different sequence specificities. These proteins are highly 
enriched at the nuclear periphery in ring stages and 
expand to the cytoplasm in more mature stages where 
they form speckles. Our results demonstrate for the first 
time in a eukaryotic system that Alba-like proteins bind to 
both DNA and RNA suggesting a dual role in chromatin 
biology and RNA regulation. 

MATERIAL AND METHODS 

Parasite culture 

Plasmodium falciparum blood stage parasites were 
cultivated as previously described (24). 

Nuclear and cytoplasmic extracts 

Nuclear and cytoplasmic extracts were prepared as previ- 
ously described (17) with some modifications. A total of 
5 x 10^9 parasites were isolated from infected erythrocytes
by saponin lysis, resuspended in 1 ml of lysis buffer
(10 mM HEPES pH 7.9, 10 mM KCl, 0.1 mM EDTA,
0.1 mM EGTA, 1 mM DTT, 0.65% NP-40) supplemented
with protease inhibitors (Complete, Roche) and incubated
for 30 min at 4°C. Total parasite lysis was achieved by 200
strokes in a prechilled Dounce homogenizer. The lysate
was centrifuged for 10 min at 14 000 rpm at 4°C. The
supernatant representing the cytoplasmic fraction was
recovered, aliquoted and stored at -80°C. The nuclear pellet
was washed three times with phosphate-buffered saline
(PBS) and then resuspended in 100 µl of extraction
buffer (20 mM HEPES pH 7.9, 400 mM NaCl, 1 mM
EDTA, 1 mM EGTA, 1 mM DTT) supplemented with
protease inhibitors and incubated with vigorous shaking
for 30 min at 4°C. The preparation was then centrifuged
for 10 min at 14 000 rpm at 4°C. The supernatant
representing the nuclear fraction was recovered, aliquoted and



stored at -80°C. The purity of the extracts was checked
by western blotting, probing the membrane with anti-HSP70
and anti-H3 me3K9 antibodies.

Identification of TARE6-associated proteins 

Electromobility shift assays (EMSAs) using a radiolabeled 
TARE6 DNA probe (Supplementary Table S1) and ring
stage nuclear extracts were performed as previously 
described (17). The DNA-protein complex was analyzed 
on a non-denaturing polyacrylamide gel, the complex was 
cut out of the gel and proteins were recovered by 
electro-elution in running buffer (0.025 M Tris, 0.192 M 
glycine, 1% SDS) using an Electro-Eluter (Model 422; 
Bio-Rad) at 10 mA for 5 h. After elution, the proteins
were dialyzed, lyophilized and resuspended in PBS. The 
proteins were quantified by Lowry's method, resolved by 
SDS-PAGE and visualized by silver staining using the 
Silver-Quest Staining Kit (Invitrogen). The identity of 
the proteins was established by mass spectrometry.

Biotinylated TARE6 DNA was immobilized onto 
Streptavidin-coupled Dynabeads following an incubation 
in binding buffer (20 mM HEPES pH 7.9, 100 mM KCl,
2 mM MgCl2, 0.5 mM EDTA, 1 mM DTT, 0.4 mM
ZnSO4, 40 mM ZnCl2, 10% glycerol and 0.1% Triton
X-100) for 1 h at 4°C. The biotinylated 165-bp KAHRP 
DNA probe was obtained by PCR amplification with 
primers KAHRPups and KAHRP2 (Supplementary 
Table S1) and immobilized onto Streptavidin-coupled
Dynabeads as described above. DNA-coated beads were 
collected, washed twice in binding buffer and incubated 
with nuclear extracts for 30 min at 25°C. Protein-bead
complexes were washed four times with binding buffer
containing 250 mM NaCl and proteins were eluted with
Laemmli sample buffer without any dye by boiling at 95°C
for 5 min. Then, the proteins were concentrated and
desalted using Amicon Centricon columns and analyzed 
by SDS-PAGE followed by silver staining. Finally, the 
proteins associated with TARE6 and KAHRP DNA 
were identified by mass spectrometry. 

Mass spectrometry 

The in-gel digestion of proteins was carried out according 
to the manufacturer's manual (Pierce, Rockford, IL). The 
desired band was excised, destained and possible disulfide 
bonds reduced with tris (2-carboxyethyl) phosphine 
(TCEP) and alkylated with iodoacetamide. Then, the gel 
pieces were dehydrated with acetonitrile and rehydrated 
with 100 ng trypsin (Promega, Madison, WI) in 25 mM
ammonium bicarbonate solution and were incubated at 
37°C overnight. The tryptic fragments were extracted 
from the gel by adding 1% trifluoroacetic acid. Next, the
peptides were purified with a C18 reversed-phase
minicolumn filled into a micropipette tip, i.e. ZipTip
C18 (Millipore, Bedford, MA). Purified peptides were
cocrystallized with α-cyano-4-hydroxycinnamic acid
matrix (Applied Biosystems, Foster City, CA, USA) on 
a matrix-assisted laser desorption ionization (MALDI) 
target plate. Both mass spectrometry (MS) and MS/MS 
spectra were acquired in a MALDI-time of flight 
(MALDI-TOF/TOF) Mass Spectrometer (Applied
Biosystems 4800 Proteomics Analyzer) in the reflector 
mode. Calibration was updated before each acquisition 
using a standard peptide mixture according to the instru- 
ment's protocol. Protein identification was performed 
using the GPS Explorer software (Applied Biosystems) 
with the MASCOT search engine (Matrix Science). The 
search was performed against the NCBInr 20080502 
database (6493741 sequences; 2216039219 residues) and 
the Mascot significance threshold was set at P = 0.05. 
Individual Ions scores >33 indicated identity or extensive 
homology and were considered to be significant. 
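
As a reading aid for the score cut-off above, the following minimal sketch converts between a Mascot ions score and the probability it encodes. It relies on the Mascot convention that the ions score is reported as -10·log10(P), where P is the probability that the observed match is a random event. The function names are hypothetical and are not part of GPS Explorer or Mascot. The cut-off of 33 is higher than the value of about 13 that P = 0.05 alone would give, presumably because the search engine also accounts for the number of candidate peptides; that correction is not modelled here.

```python
import math


def ions_score_to_probability(score):
    """Probability that a match is a random event, for a given ions score.

    Assumes the convention score = -10 * log10(P).
    """
    return 10.0 ** (-score / 10.0)


def probability_to_ions_score(probability):
    """Inverse conversion: ions score corresponding to a probability P."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -10.0 * math.log10(probability)


if __name__ == "__main__":
    # Cut-off quoted in the text (ions score > 33).
    print(ions_score_to_probability(33.0))
    # Score equivalent of the raw P = 0.05 threshold, without any
    # correction for the number of candidate peptides.
    print(probability_to_ions_score(0.05))
```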

Antibodies (Abs) 

Rabbit polyclonal Abs against PfAlba1-4 resulting from
immunizations of rabbits with PfAlba-specific synthetic
peptides were purchased from GenScript Corporation
(genscript.com). The peptide sequences are listed in
Supplementary Table S2. Mouse polyclonal Abs against
PfAlba1-4 were isolated from the sera of six mice
immunized with recombinant GST-tagged PfAlba
proteins. Briefly, each recombinant protein was emulsified
with complete adjuvant (Sigma) and 100 µg was
inoculated into a single mouse. Following the inoculation 
series, animals were sacrificed and serum was collected. 
Immunoglobulins were purified from the serum by 
protein A-sepharose chromatography following standard 
procedures (Pharmacia). 

Recombinant proteins 

The coding regions of PfAlba1-4 and the two tandem AP2
domains of PfSip2 (PFF0200c; residues 177-312) were 
PCR-amplified from cDNA prepared from 3D7 parasites 
and cloned into the pGEX-3X vector (GE Healthcare), 
downstream of the gene encoding GST. Penta-His 
tagged PfAlba3 was obtained by cloning the PF10_0063 
gene into the pET28a vector (Novagen). The primer 
pairs used are displayed in Supplementary Table S3. 
Recombinant protein expression was carried out 
in BL21-CodonPlus(DE3)-RIL Escherichia coli cells
(Stratagene). The recombinant proteins were induced
with 0.1 mM IPTG at an OD600 of 0.6 for 3 h at 37°C.
GST-fusion proteins were purified using Glutathione 
Sepharose 4B beads (GE Healthcare) and His-tagged 
proteins were purified using a Ni-NTA Superflow 
gravity column (Qiagen) according to the manufacturer's
instructions. The resulting eluates were resolved using
SDS-PAGE and protein purity was determined by 
Coomassie blue staining. 

DNA and RNA EMSAs

The 79-bp biotinylated TARE6 probe was obtained as 
previously described (21). The 79-bp biotinylated AP2 
probe containing two GTGCA motifs was obtained by 
PCR amplification of the 5' regulatory region of the 
MAL7P1.119 gene from genomic DNA isolated from 
3D7 parasites. The primers used are displayed in 
Supplementary Table S3. Fragments of dsDNA were 
obtained by hybridization of the sense oligonucleo- 
tide with its complementary anti-sense sequence. 



All other unlabeled and biotin-labeled oligonucleotides 
(Supplementary Table S1) were purchased from
Eurogentec. EMSAs were performed as previously 
described (17). 

ssRNA oligonucleotides (Supplementary Table S1)
were labeled at the 5'-end with [γ-32P]-ATP using the
Ready-To-Go T4 Polynucleotide Kinase kit (Amersham
Biosciences). Radiolabeled oligonucleotides were
separated from free [γ-32P]-ATP using Sephadex G-25
columns (Roche). RNA EMSAs were performed as previ- 
ously described (25). 

Immunofluorescence microscopy 

Synchronized cultures of the 3D7 P. falciparum parasite 
line were washed in PBS, lysed in saponin (0.015%) and
fixed in suspension with 4% paraformaldehyde (Electron
Microscopy Sciences) for 15 min at room temperature
(RT). Fixed parasites were blocked for 30 min with
PBS+1% bovine serum albumin (BSA) and incubated
with the primary Abs diluted in PBS+1% BSA for
45 min at RT. After washing with PBS, parasites were
incubated with the secondary Abs diluted in
PBS+1% BSA for 30 min at RT. Finally, labeled parasites
were deposited on microscope slides and mounted in 
Vectashield anti-fading solution supplemented with 
DAPI (Vector Laboratories). Images were captured 
using a Nikon Eclipse 80i optical microscope and 
analyzed with the NIS-Elements BR software (Nikon). 
For confocal microscopy, images were captured using a 
Nikon Eclipse TE2000-E confocal microscope and 
analyzed with the EZ-C1 software (Nikon). 

Western blotting 

Parasite cytoplasmic and nuclear extracts (equivalent to 
5 x 10^7 parasites per lane) were resolved on a 4-12%
SDS-PAGE gel, transferred onto a nitrocellulose
membrane and subjected to western blotting with specific
Abs against the PfAlbas (Supplementary Table S2),
anti-H3 me3K9 (Abcam) and anti-HSP70 (26). After
incubation with secondary antibodies (horseradish
peroxidase-conjugated), membranes were developed with
SuperSignal West Pico Chemiluminescent Substrate (Pierce).

Electron microscopy 

Ultrathin sections of P. falciparum blood stage parasites 
were immuno-stained for electron microscopy (EM) 
analysis. Plasmodium falciparum-infected erythrocytes
were fixed in 1% glutaraldehyde in RPMI-HEPES
buffer for 1 h at 4°C. After washing, polymerization in
Agar type IX and dehydration with ethanol, the
samples were transferred into LR-White (London Resin
Company Ltd, Berkshire, UK) and polymerized for
12 h at 4°C. Ultrathin sections were collected and 
mounted on Cu/Pd grids. Sections were blocked in PBS 
containing 5% (wt/vol) nonfat dried milk and 0.01% 
(wt/vol) Tween-80 followed by a washing with PBS con- 
taining 0.8% BSA (fraction V) and 0.01% Tween-80. The 
washed grids were incubated for 2.5 h with anti-PfAlba 
Abs diluted in the above-mentioned solution. Samples 
were washed and incubated for 25 min with 10 nm
protein A-colloidal gold (Cell Microscopy Center,
University Medical Center Utrecht, The Netherlands).
Washed sections were stained for 15 min with aqueous
4% uranyl acetate followed by staining for 2 min with
1% lead citrate. For double labeling, the first labeling was
performed with anti-Histone 3 Abs (Abcam ab1791)
diluted at 1/100 and revealed with 15 nm protein
A-colloidal gold. The grids were washed and fixed with
1% glutaraldehyde in PBS for 5 min and blocked. After
washing, the second labeling was performed with
anti-PfAlba Abs diluted at 1/50 and revealed with 10 nm
protein A-colloidal gold according to the same protocol.
The grids were washed and stained with aqueous 4%
uranyl acetate followed by staining for 2 min with 1%
lead citrate. Sections were analyzed with a JEOL 
JEM-1200EX electron microscope. 

PfAlba-Ty1 transfectant parasites

The PfAlba1-Ty1C and PfAlba4-Ty1C constructs were
obtained by replacing the pfenr-GFP sequence contained
in the pLN-ENR-GFP plasmid (27) with the coding
regions of PF08_0074 and MAL13P1.237 respectively.
In addition, a nucleotide sequence encoding two
repeats of the Ty1 epitope (LEVHTNQDPLD) was
inserted downstream of the PfAlba genes. Episomal
transfection of 3D7 parasites was performed as previously
described (28). PfAlba-Ty1 expression was detected
using a rabbit anti-Ty1 Ab (Genscript) or the mouse
BB2 monoclonal anti-Ty1 Ab (29).

Analytical ultracentrifugation 

His-tagged PfAlba3 samples in PBS (8 µM) were
centrifuged at a speed of 42 000 rpm in a Beckman
Coulter XL-I analytical ultracentrifuge at 20°C using an
AN60-Ti rotor equipped with 12 mm double-sector epon 
centrepieces. Detection of the protein concentration as a 
function of radial position and time was performed by 
optical density measurements at a wavelength of 220 nm. 
The following parameters were calculated using Sednterp 
1.09 and used for the analysis of the experiment: partial 
specific volume v̄ = 0.735 ml/g, viscosity η = 1.052 cP and
density ρ = 1.019 g/ml. Sedimentation velocity data
analysis was performed by continuous size distribution 
analysis c(s) using the program Sedfit 12.0 (30) (available 
at http://www.analyticalultracentrifugation.com). All the 
c(s) distributions were calculated with a fitted frictional 
ratio f/f0 and a maximum entropy regularization
procedure with a confidence level of 0.95.
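
To illustrate how a sedimentation coefficient and a fitted frictional ratio translate into a molar mass, the sketch below applies the Svedberg relation together with Stokes friction for an equivalent anhydrous sphere, using the partial specific volume, viscosity and density quoted above. This is a simplified illustration under those assumptions and is not the Sedfit implementation; the function names and the example inputs (a 12 kDa species with f/f0 = 1.2) are hypothetical and are not results of this study.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol

# Solvent and protein parameters quoted in the text, in CGS units.
V_BAR = 0.735     # partial specific volume, cm^3/g
ETA = 1.052e-2    # viscosity, poise (1.052 cP)
RHO = 1.019       # density, g/cm^3


def s_from_molar_mass(mass, f_ratio, v_bar=V_BAR, eta=ETA, rho=RHO):
    """Sedimentation coefficient (in Svedberg) for a molar mass in g/mol.

    s = M (1 - v_bar * rho) / (N_A * f), with f = (f/f0) * 6 * pi * eta * R0
    and R0 the radius of the anhydrous sphere of the same mass.
    """
    buoyancy = 1.0 - v_bar * rho
    r0 = (3.0 * mass * v_bar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    friction = f_ratio * 6.0 * math.pi * eta * r0
    return mass * buoyancy / (AVOGADRO * friction) / 1e-13


def molar_mass_from_s(s_svedberg, f_ratio, v_bar=V_BAR, eta=ETA, rho=RHO):
    """Molar mass (g/mol) from s (in Svedberg) and the frictional ratio."""
    s = s_svedberg * 1e-13  # convert Svedberg to seconds
    buoyancy = 1.0 - v_bar * rho
    # R0 = k * M**(1/3), so s is proportional to M**(2/3).
    k = (3.0 * v_bar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    m_two_thirds = s * AVOGADRO * f_ratio * 6.0 * math.pi * eta * k / buoyancy
    return m_two_thirds ** 1.5


if __name__ == "__main__":
    # Hypothetical example: a 12 kDa species with f/f0 = 1.2.
    s_value = s_from_molar_mass(12000.0, 1.2)
    print(s_value, molar_mass_from_s(s_value, 1.2))
```

Because s scales with the two-thirds power of the mass at a fixed frictional ratio, a dimer of a compact species is expected to sediment roughly 1.6 times faster than the monomer, which is one way oligomeric states can be told apart in a c(s) distribution.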

RESULTS 

The TARE6-associated complex contains proteins that are 
homologous to the archaeal chromatin protein family Alba 

Plasmodium falciparum chromosome ends are composed 
of degenerate G-rich heptameric repeats, the telomeres, 
followed by a mosaic of six non-coding subtelomeric 
regions, TARE1-6 (Figure 1A) (31,32). To define the mo- 
lecular nature of the protein complex associated with 
TARE6 (17), we used two different experimental 



approaches (EMSA and oligonucleotide pull down), 
each using a labeled TARE6 probe containing three
repetitive units (Supplementary Table S1). In the first
approach (Figure 1B), the protein complex that bound
to radiolabeled TARE6 in EMSAs was isolated from
ring stage parasite nuclear extracts using gel
electrophoresis and subjected to mass spectrometry. In the second
approach (Figure 1B), biotinylated TARE6 was
immobilized onto streptavidin-coated beads and proteins 
from ring stage parasite nuclear extracts that bound to 
TARE6 were identified by mass spectrometry. A similar 
pull down experiment was performed using a probe from 
the promoter region of the Knob-Associated Histidine-Rich
Protein (KAHRP) gene (PFB0100c) as a non-TARE DNA
control. 

LC-MS/MS analysis revealed that TARE6 interacts 
with a large molecular complex of over 30 proteins 
(Supplementary Table S4); only five of these proteins, 
including a few histones, were identified in the pull down 
experiment performed using the KAHRP probe 
(Supplementary Table S4). Intriguingly, three members 
of the TARE6-associated complex PF08_0074, 
MAL13P1.233 and MAL13P1.237 (plasmodb.org) 
(Figure 1C) presented homology to the archaeal DNA/ 
RNA-binding protein family Alba, that plays an import- 
ant role in chromatin organization in archaea (33). 
Further sequence analysis of the P. falciparum genome 
revealed the existence of a fourth paralog, PF10_0063. 
We name the four members of the newly described
Alba-like protein family in P. falciparum PfAlba1,
PfAlba2, PfAlba4 and PfAlba3, respectively. Assessment
of protein architecture using the Pfam (www.ebi.ac.uk/pfam)
and InterPro (www.ebi.ac.uk/interpro) databases
(Figure 1D) showed that PfAlba1 (27 kDa) and PfAlba2
(25 kDa), in addition to their N-terminal Alba-like
domains (residues 8-123 and residues 9-87 respectively),
contain multiple C-terminal RGG-box (arginine- and
glycine-rich) RNA-binding domains. PfAlba3 (12 kDa),
the shortest member of the PfAlba protein family, is
essentially composed of one module of the Alba-like domain
(residues 10-102). Finally, PfAlba4, a 42 kDa protein, is
composed of two distinct domains: an N-terminal 
Alba-like domain (residues 19-115) and a central 
membrane-tethering ENTH/VHS domain (residues 
175-229) (InterPro ID: IPR008942). Multiple sequence 
alignment analysis showed that the PfAlbas possess 
conserved orthologs in other Plasmodial species 
(Supplementary Figure S1).

To conclusively establish the presence of the PfAlbas in 
the TARE6-associated protein complex, we performed 
EMSAs with the radiolabeled TARE6 probe, nuclear 
extracts of ring stage parasites and purified rabbit 
antibodies raised against PfAlba1, PfAlba2, PfAlba3 or
PfAlba4 (Supplementary Table S2). The anti-PfAlba1,
-PfAlba2 and -PfAlba4 antibodies 'super-shifted' the
TARE6-associated protein complex in a dose-dependent
manner (Figure 1E; complex 'SC') whereas no super-shift
was observed using antibodies directed against PfAlba3 
(data not shown). Pre-immune sera did not lead to 
super-shifts. Taken together, we show that the 
TARE6-associated molecular complex contains three 
Figure 1. TARE6 associates with a large molecular complex containing
paralogs of the PfAlba protein family. (A) Schematic representation of 
the subtelomeric regions of P. falciparum chromosomes. The non- 
coding TAREs are adjacent to the subtelomeric var genes. 



paralogs of a previously uncharacterized protein family in 
P. falciparum, PfAlba1, PfAlba2 and PfAlba4, bearing
homology to the archaeal chromatin protein family Alba. 

PfAlbas bind to DNA with relaxed sequence specificity 

In archaea, Alba proteins bind to DNA and are proposed 
to play an important role in chromatin architecture (33). 
To elucidate whether the PfAlbas are capable of binding 
to DNA and directly interacting with TARE6, we per- 
formed EMSAs using recombinant GST-tagged PfAlbas 
(Supplementary Figure S2) and a biotinylated version of 
the double-stranded TARE6 DNA probe described above. 
Parasite nuclear extract served as a positive control for 
TARE6 binding. As shown in Figure 2A, all four 
PfAlbas were able to bind to the TARE6 probe in a 
dose-dependent manner. In particular, with increasing 
concentrations of PfAlba3, the DNA-protein complex 
migrated at a slower rate suggesting that this PfAlba, 
which contains just an Alba domain, could form higher 
order complexes similar to its archaeal counterparts (34). 
Our data demonstrate for the first time in a eukaryotic 
system that Alba-like proteins can directly bind to DNA. 

In archaea, Albas have been reported to bind to DNA 
without sequence specificity (35). First focusing on 
PfAlba4, we wanted to assess if the same holds true for 
the P. falciparum Albas. Therefore, we designed 
a biotinylated DNA probe containing two tandem GTGCA
motifs that represent the recognition sequence
of PfSIP2, a recently described P. falciparum DNA- 
binding protein of the ApiAP2 family (8,9). This probe 
is of similar length to the TARE6 probe (79 bp) and is 
referred to as AP2 throughout this study 
(Supplementary Table S1). As expected, recombinant
GST-tagged PfSIP2 bound to the AP2 probe and inversely 
did not bind to the TARE6 probe, which lacks GTGCA 
motifs (left panel of Figure 2B). Next, incubation of 
PfAlba4 with the AP2 probe did not lead to any detectable 
band shift, although under the same conditions, PfAlba4 
bound to the TARE6 probe (left panel of Figure 2B). 
These data suggest that, in contrast to the archaeal 
Albas, PfAlba4 may bind to DNA with an apparent 
sequence preference. To test this further, we assessed if 



Figure 1. Continued 

(B) Flowchart representation of the two approaches used to isolate the 
protein complex that associates with TARE6 in vitro. (C) Out of over 
30 protein candidates identified by mass spectrometry (Supplementary 
Table S4), three proteins showed similarity to the archaeal DNA/ 
RNA-binding protein family Alba. Plasmodb accession numbers and 
the number of unique peptides identified are tabulated. (D) Pfam and 
InterPro databases were used to determine the architecture of PfAlba 
proteins. The Alba-like domain is depicted in black, the ENTH/VHS
domain in light gray and the RGG-box RNA-binding domain in dark
gray. The scale bar represents 50 amino acids. (E) To assess the presence
of PfAlba1, PfAlba2 and PfAlba4 in the molecular complex associated
with TARE6, a 32P-labeled TARE6 probe was incubated with
P. falciparum ring stage nuclear extract 'NE' and subjected to EMSAs. A single
DNA-protein complex 'C' formed when the TARE6 probe was
incubated with nuclear extract alone. Addition of increasing amounts
of specific antibodies (µg of immunoglobulin G indicated as µg IgG)
against PfAlba4 (Ab 237-1), PfAlba1 (Ab 74-2) or PfAlba2 (Ab 233)
led to the formation of a super-shift complex 'SC'. Incubation with the
respective pre-immune sera 'PI' did not result in a super-shift.
Figure 2. PfAlbas bind to DNA with relaxed sequence specificity.
(A) In order to assess the DNA-binding properties of PfAlbas, a
biotin-labeled TARE6 probe was incubated with P. falciparum
nuclear extract 'NE' or increasing concentrations [ranging from 0.5 to
5 µM] of recombinant PfAlba1-4 and subjected to EMSAs. (B) Left
panel: To assess PfAlba4 specificity, biotin-labeled TARE6 and AP2
probes were each incubated with either 5 µM GST or 0.1 µM PfSIP2 or
0.5 µM PfAlba4; Right panel: Competition EMSAs were also used to
determine the binding specificity of PfAlba4 to DNA. 0.5 µM
recombinant PfAlba4 was incubated with biotin-labeled TARE6 (0.28 nM) in
the absence (No comp.) or presence of 50-fold excess of the indicated
unlabeled dsDNA competitor. (C) Quantification of competition
EMSAs performed using PfAlba1, PfAlba2 and PfAlba4, and the
indicated competitor probes. The percentage inhibition was calculated 
by normalizing each datapoint to TARE6-PfAlba binding in the 
absence of competitor. Error bars represent the standard deviation of 
data obtained from three independent experiments. (D) Fragments of 
TARE6 DNA ranging from 7 to 42 bp in length (left panel) were used 
in competition EMSAs to determine the minimal length required by 
PfAlba4 for DNA binding. Data were quantified as described in the 
legend of Figure 2C. Error bars represent the standard deviation of 
data obtained from three independent experiments. 



a number of non-TARE6 DNA sequences would be able 
to compete for binding of PfAlba4 to TARE6 in EMSAs. 
In addition to an unlabeled AP2 probe, we designed two 
probes from the coding regions of PfHSP70 (PF08_0054) 
and PfActin-II (PF14_0124) and one probe containing a 
scrambled nucleotide sequence from the first 42 bp of the 
TARE6 probe (ScrF1; Supplementary Table S1). All
competition experiments used a 50-fold excess of the
unlabeled 'competitor' probe. Surprisingly, the ScrF1,
HSP70 and Actin-II probes were almost as potent in 
inhibiting TARE6-PfAlba4 binding as the unlabeled 
TARE6 probe (97% inhibition) (right panel of 
Figure 2B and C). In contrast, the unlabeled AP2 probe 
was less efficient at inhibition (54%), supporting the poor 
AP2-PfAlba4 binding observed in Figure 2B (left panel). 
Taken together, our results demonstrate that recombinant 
PfAlba4 binds to DNA with relaxed sequence preference. 
To determine if other PfAlbas exhibit similar sequence 
requirements, we performed TARE6-PfAlba1 or
-PfAlba2-binding EMSAs with the competitor probes
described above. We observed that the non-TARE6
probes were less efficient at inhibiting TARE6-PfAlba1/2
binding (Figure 2C) as compared to TARE6-PfAlba4
binding. Interestingly, a 50-fold excess of the unlabeled
TARE6 probe inhibited TARE6-PfAlba1 binding up to
just 70%. This suggests that the PfAlbas present
differential sequence preferences in vitro with PfAlba1 possessing
the highest affinity for TARE6. 
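
The quantification behind the competition EMSAs can be sketched as follows. This is a minimal illustration that assumes percent inhibition is computed as one minus the ratio of shifted-band intensity with competitor to the intensity without competitor, which is one reading of the normalization described in the legend of Figure 2C. The function names and the band intensities used in the example are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def percent_inhibition(bound_with_competitor, bound_without_competitor):
    """Percent inhibition of probe binding, normalized to the no-competitor lane."""
    if bound_without_competitor <= 0:
        raise ValueError("no-competitor signal must be positive")
    return 100.0 * (1.0 - bound_with_competitor / bound_without_competitor)


def summarize_replicates(replicates):
    """Mean and standard deviation of percent inhibition.

    `replicates` is a list of (with_competitor, without_competitor) band
    intensities, one pair per independent experiment.
    """
    values = [percent_inhibition(w, n) for w, n in replicates]
    return mean(values), stdev(values)


def competitor_concentration(probe_nM, fold_excess):
    """Concentration of unlabeled competitor for a given molar excess."""
    return probe_nM * fold_excess


if __name__ == "__main__":
    # Hypothetical band intensities from three independent experiments.
    hypothetical = [(4.0, 100.0), (2.5, 95.0), (3.0, 110.0)]
    print(summarize_replicates(hypothetical))
    # Labeled probe at 0.28 nM with a 50-fold excess of competitor.
    print(competitor_concentration(0.28, 50))
```

Under this reading, a competitor that leaves 3% of the shifted signal gives 97% inhibition, and a 50-fold excess over a 0.28 nM labeled probe corresponds to about 14 nM of unlabeled DNA.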

Finally, to define the minimum length required by 
PfAlba4 for DNA binding, we performed competition 
EMSAs using TARE6 fragments of varying sizes 
(42-7 bp) (Figure 2D). Two repetitive units, i.e. the 42 bp 
probe, were able to fully inhibit TARE6-PfAlba4 binding 
whereas the percent inhibition by shorter TARE6 frag- 
ments gradually decreased, with a sharp drop at 14 bp. 
Our results define a minimal DNA length for efficient 
PfAlba4 recruitment to be >14 bp. This could also
explain our inability to identify specific DNA-binding
motifs for PfAlba1 and PfAlba4 in protein-binding
microarrays that used 10 bp oligomers (M. Llinas and
T. Campbell, personal communication). 

PfAlbas localize to perinuclear foci in ring stages and 
expand to the cytoplasm in mature forms 

Our in vitro data identified the PfAlbas as novel 
DNA-binding proteins with potential multiple targets in 
the nucleus. Next, we investigated the cellular localization 
of these proteins during P. falciparum blood stages. We 
performed immunofluorescence assays on synchronized 
ring, trophozoite and schizont parasite cultures using 
either affinity-purified antibodies raised in rabbits 
against synthetic peptides of PfAlba1-4 or polyclonal
antibodies obtained from mice immunized with
recombinant GST-tagged PfAlba1-4 (Supplementary Table S2).
Antibodies against PfAlba1, PfAlba2 and PfAlba4
stained the nuclear periphery in a spotted pattern similar 
to that observed for telomere clusters (Figure 3A and 
Supplementary Figure S3) (21); the antibodies against 
PfAlba3 did not give a signal under the conditions 
tested. Moreover, co-localization assays with antibodies 
Figure 3. PfAlbas localize to perinuclear foci in ring stages and expand to the cytoplasm in mature forms. (A) Immunofluorescence analysis of
PfAlba1, PfAlba2 and PfAlba4 distribution throughout the parasite life cycle (Rings, Trophozoites, Schizonts). The anti-PfAlba antibodies (Ab 74-2
for PfAlba1; Ab 233 for PfAlba2; Ab 237-2 for PfAlba4) were used at 5.5, 4.5 and 8 µg/ml, respectively. Anti-rabbit Alexa-488 secondary antibodies
(Molecular Probes) were used at a dilution of 1:500. Scale bars represent 1 µm. (B) Immunofluorescence analysis of 3D7 Rings, Trophozoites and
Schizonts stained with anti-PfAlba4 (red) and anti-PfSir2 (green) antibodies. Scale bar represents 1 µm. (C) Western blot analysis of nuclear and
cytoplasmic fractions obtained from synchronous cultures of Ring, Trophozoite and Schizont parasites. The equivalent of 5 x 10^7 parasites was
loaded in each lane and probed with the indicated anti-PfAlba antibody (Ab 74-2 for PfAlba1; Ab 233 for PfAlba2; Ab 237-1 for PfAlba4).
Anti-H3 me3K9 (nuclear marker) and anti-HSP70 antibodies were used to determine fractionation efficiency. (D) Immuno-EM was performed on ring
stage parasites using anti-PfAlba4 antibodies (Ab 237-2) bound to 10 nm gold particles and anti-H3 antibodies bound to 15 nm gold particles. The
inset shows a single developing merozoite within schizont-stage parasites labeled with anti-PfAlba4 antibodies (Ab 237-2) bound to 10 nm gold 
particles; n = nucleus, c = cytoplasm, r = rhoptry; Scale bars represent 500 nm. 



against PfSir2 and PfAlba4 indicated that the nuclear 
distribution of PfAlba4 partially overlaps with an epigen- 
etic factor belonging to the TARE6-associated PERC 
(Figure 3B). In contrast, during the trophozoite and 
schizont stages, a strong signal was detected for 
PfAlba1, PfAlba2 and PfAlba4 in the cytoplasm, primarily
in a punctate manner (Figure 3A and Supplementary
Figure S3A). This distribution is reminiscent of
mRNA-containing P-granules observed in P. berghei gametocytes
(36). Also, co-localization assays with anti-PfSir2 and
-PfAlba4 antibodies showed that the PfAlba4 signal
dissociates from the PfSir2 signal in mature stages
(Figure 3B), especially when analyzed by confocal micros- 
copy (Supplementary Figure S3B). 

Western blot analysis of the cytoplasmic and nuclear 
fractions reflected the dynamic changes in the subcellular 
compartmentalization of PfAlba1, PfAlba2 and PfAlba4 
during the parasite life cycle (Figure 3C). These PfAlbas 
were detected exclusively in the nuclear fraction in the ring 
stage, whereas in mature stages, cytoplasmic localization 
became more dominant. Protein relocation can often be 
attributed to either proteolytic processing or sumoylation 
events. Because the PfAlbas migrate at their predicted 
size during gel electrophoresis and western blotting, we 
suppose that other modifications that could modulate 
the function of the PfAlbas without drastically changing 
the protein size may control their subcellular localization. 
Again, the anti-PfAlba3 antibodies did not detect a band 
at the predicted size in western blotting although they 
cross-reacted with recombinant GST-PfAlba3 under the 
same conditions (data not shown). 

In order to further define the exact localization of the 
PfAlbas within the nucleus, we immunolabeled ultrathin 
sections of blood stage parasites using antibodies directed 
against the PfAlbas and performed EM. We present here 
images obtained for PfAlba4 (Figure 3D). In the ring 
stage, we detected PfAlba4 uniquely in the nucleus. 
Since the nuclear membrane is difficult to visualize in 
immuno-EM in the ring stage, co-labeling with antibodies 
directed against histone H3 was used to determine the area 
of the nucleus in this parasite form. The images localize 
PfAlba4 at the nuclear periphery. This pattern was con- 
firmed in developing merozoites (inset of Figure 3D) 
where we could detect distinct clusters of PfAlba4 at the 
periphery of the nuclei. This pattern was similar to the one 
previously observed for PfSir2 (17). Although western 
blots indicated that most of the PfAlba4 is in the cytoplas- 
mic fraction of schizonts (Figure 3C), our EM finding can 
be explained by a better accessibility of PfAlba4 in the 
nucleus to antibodies under the EM fixation conditions 
as compared to the cytoplasm. Similar experiments performed 
with anti-PfAlba1 are depicted in Supplementary 
Figure S3C. In all sets of experiments, incubation with 
pre-immune sera did not give any signal. Our data dem- 
onstrate that the PfAlbas occupy distinct cellular compart- 
ments hinting at multiple distinct roles during the asexual 
blood stage. 

PfAlbas bind to RNA with differential specificities 

Their cytoplasmic location raises the question as to 
whether the PfAlbas play a role in the RNA biology of 
P. falciparum blood stages. To investigate this, we asked 
whether the PfAlbas could bind to RNA by performing 
EMSAs with recombinant GST-tagged PfAlbas and 
22-mer ssRNA oligonucleotides, polyA and polyU 
(Supplementary Table S1). Because of the AT-richness 
of the P. falciparum genome [~75% in exons (31)], these 
ssRNAs are the most representative of native transcripts. 
Parasite nuclear extracts and the ApiAP2 DNA-binding 
protein PfSip2 were used as controls. We observed that all 
four Albas could bind to polyA and polyU, albeit with 
different specificities (Figure 4A). At the protein concentrations 
tested, PfAlba1 showed a 2-fold stronger affinity 
for polyA as compared to polyU while PfAlba2 and 
PfAlba4 bound to both probes at similar levels. Finally, 
PfAlba3, which contains just the Alba-like domain, ex- 
hibited a high affinity for polyA and a lower affinity for 
polyU. These data suggest that the PfAlbas may bind to 
different RNA targets in vivo. Interestingly, for PfAlba2-4, 
multiple species of ssRNA-protein complexes were 
observed, hinting at protein oligomerization. 

We next wanted to assess if the ssRNA oligonucleotides 
polyA and polyU could compete for the binding of 
PfAlbas to TARE6 DNA. Continuing to focus on 
PfAlba4, we performed TARE6 EMSAs using a 50-fold 
excess of polyA and polyU as compared to the 
biotinylated TARE6 probe. PolyA abolished TARE6- 
PfAlba4 binding to the same extent as a 50-fold excess 
of unlabeled TARE6 whereas polyU had only a minor 
effect on the binding of PfAlba4 to TARE6 (Figure 4B). 
That RNA molecules compete for PfAlba4 DNA binding 
strongly suggests that PfAlba4 utilizes the same domain to 
bind to DNA and RNA. This is an intriguing concept with 
potential regulatory implications in the nucleus. 

PfAlba3 forms an elongated dimer 

The PfAlba3-DNA and -RNA complexes migrate at dif- 
ferent sizes with an increase in protein concentration 
Figure 4. PfAlbas bind to ssRNA in vitro. (A) To determine binding to 
the radiolabeled ssRNA probes polyA (left panel) and polyU (right 
panel), RNA EMSAs were performed in the absence (Probe alone) or 
presence of nuclear extracts from P. falciparum trophozoites (NE), 
1.5 µg GST, or two concentrations (0.3 and 1.2 µg) of the indicated 
GST-tagged PfAlba protein. A GST-tagged version of the 
DNA-binding domain of PfSip2 served as the negative control. 
RNP = Ribonucleoprotein complex. (B) The ability of the ssRNA 
probes to compete for PfAlba4-TARE6 binding was assessed by 
using 50-fold molar excess of the indicated unlabeled probes in 
TARE6 DNA EMSAs. Controls included reactions either lacking 
PfAlba4 (Probe alone) or containing 50-fold molar excess of unlabeled 
TARE6. Data were quantified as described in the legend of Figure 2C. 
Error bars represent the standard deviation of data obtained from three 
independent experiments. 



(Figures 2A and 4A). Given that the archaeal Albas 
bind to DNA by forming dimers and dimer-dimer 
stacks (34), we wanted to determine the oligomeric 
status of recombinant PfAlba3. Because GST alone 
exists as a dimer (37), results obtained from GST-PfAlba3 
native gel electrophoresis and size-exclusion chromatography 
analyses were inconclusive (data not shown). 
Instead, we generated penta-His-tagged PfAlba3 by 
affinity purification, enriched it by size-exclusion chromatography 
(Figure 5A) and subjected it to analytical ultracentrifugation 
(Figure 5B). At a concentration of 8 µM, 
the data revealed the presence of a single species with a 
sedimentation coefficient of 1.3 ± 0.2 S and a frictional 
ratio of 2.1, indicating that His-PfAlba3 is an elongated 
dimer in solution. 
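
The step from a sedimentation coefficient and a frictional ratio to an oligomeric state can be illustrated with a short calculation. The sketch below is not the authors' analysis; it simply combines the Svedberg equation, s = M(1 - v̄ρ)/(N_A f), with Stokes' law for the equivalent sphere, f = (f/f0)·6πη·R0, to estimate the molar mass implied by the reported values. The partial specific volume (0.73 cm^3/g) is an assumed typical protein value, the solvent constants are those of water at 20°C, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molar_mass_from_sedimentation(s_svedberg, frictional_ratio,
                                  vbar=0.73, rho=0.9982, eta=0.01002):
    """Estimate molar mass (g/mol) from s20,w and f/f0, in CGS units.

    vbar: partial specific volume in cm^3/g (ASSUMED typical protein value)
    rho:  solvent density in g/cm^3 (water at 20 C)
    eta:  solvent viscosity in poise (water at 20 C)
    """
    s = s_svedberg * 1e-13  # Svedberg units to seconds
    # Radius of the equivalent anhydrous sphere is R0 = c * M**(1/3)
    c = (3.0 * vbar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    # Svedberg equation rearranged for M**(2/3)
    m_two_thirds = (s * AVOGADRO * 6.0 * math.pi * eta
                    * frictional_ratio * c) / (1.0 - vbar * rho)
    return m_two_thirds ** 1.5


if __name__ == "__main__":
    # Reported values: s = 1.3 +/- 0.2 S, f/f0 = 2.1
    for s in (1.1, 1.3, 1.5):
        mass_kda = molar_mass_from_sedimentation(s, 2.1) / 1000.0
        print(f"s = {s:.1f} S -> approx. {mass_kda:.1f} kDa")
```

Under these assumptions, 1.3 S with a frictional ratio of 2.1 corresponds to roughly 21 kDa. Comparing that estimate with the calculated monomer mass of the tagged construct (not stated in this section) is what distinguishes a monomer from a dimer, and the high frictional ratio is what indicates an elongated rather than globular shape.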



DISCUSSION 

Our work provides the first description of homologs of the 
archaeal histone-like protein family Alba as DNA and 
RNA-binding proteins in a eukaryotic system. After the 
identification of the gene encoding for the founding Alba 
Figure 5. PfAlba3 exists as an elongated dimer in solution. (A) Amido 
black staining of purified penta-His-tagged PfAlba3 subjected to SDS-PAGE 
and transferred to a nitrocellulose membrane. (B) Analytical 
ultracentrifugation followed by sedimentation coefficient (S20,w) distribution 
analysis [c(s)] of His-PfAlba3. Data were collected under 
standard conditions (i.e. in water at 20°C) at a protein concentration 
of 8 µM. 



family member (38), Sac10b, homologs have been found in 
all archaeal species. Archaeal Alba homologs have been 
shown to bind to DNA without sequence specificity (35) 
and to RNA in vitro and in vivo (39). Our novel findings in 
P. falciparum highlight the evolutionary conservation as 
well as diversity of the eukaryotic Alba proteins. First, 
similar to their archaeal homologs, the PfAlbas bind to 
DNA and RNA (Figures 2 and 4). Second, the cellular 
distributions of the PfAlbas are developmentally regulated 
resulting in different subcellular localizations during the 
48-h blood stage cycle (Figure 3 and Supplementary 
Figure S3). Notably, the PfAlba4 nuclear staining during 
ring stages (i.e. foci in the nuclear periphery) partially 
follows the same pattern as the subtelomere-associated 
histone deacetylase PfSir2 (17). Given that the affinity of 
archaeal Albas for DNA is increased by Sir2-mediated 
deacetylation (40), by analogy, deacetylation of the 
PfAlbas by PfSir2 may increase their DNA-binding 
affinity within the restricted nuclear compartment and 
specifically enrich these proteins at subtelomeric regions 
of P. falciparum chromosomes. Third, in three of the four 
PfAlba members, the Alba domain has been fused either 
to an RGG-box domain or to an ENTH/VHS domain 
(Figure 1D). This adds a potential regulatory element to 
the Alba domain that may recruit other molecules to the 
PfAlba DNA and/or RNA-binding sites. 

In spite of PfAlba3's absence in the TARE6-associated 
complex (Figure 1), our EMSA results show that all 
members of the PfAlba protein family have the capacity 
to bind to a TARE6 DNA probe in vitro (Figure 2A). 
Because PfAlba3 is composed of a single domain, the 
Alba domain (Figure 1D), DNA recognition by all 
PfAlba family members is in all likelihood attributable 
to their Alba domains, with the other domains 
determining specificity. This is further supported by the 
molecular models of the tertiary structure of the Alba 
domains from PfAlba1-4 constructed using the 
archaeal Alba Sso10b (PDB ID: 1h0x) as a template 
(Supplementary Figure S4). These models show that the 
structures of the PfAlba1-4 Alba domains are likely to be 
conserved, including the extended β-hairpin that is 
proposed to interact with DNA in archaea (35). 
Interestingly, the lysine residue that is deacetylated by 
archaeal Sir2 is conserved in PfAlba2 and 4 
(Supplementary Figure S4). 

Archaeal Alba binds to DNA as a dimer that spans 
~15 bp of dsDNA with the two β-hairpins formed by 
each Alba monomer presumably interacting with equiva- 
lent minor grooves (35). Sequence alignments of Alba 
domains from several organisms including P. falciparum 
have revealed that the residues mapping to the dimer inter- 
face are well-conserved implying a conserved dimeric qua- 
ternary structure (41). Because PfAlba3 is an elongated 
dimer in solution (Figure 5B), it is reasonable to assume 
that the mechanism by which the PfAlbas bind to DNA is 
similar to the one in archaea and requires protein dimer- 
ization. This is also in line with our results indicating that 
dsDNA fragments <15 bp are poor competitors of 
TARE6-PfAlba4 binding (Figure 2D). Furthermore, it 
has been reported that the affinity of Alba proteins for 
DNA is increased by hetero-dimerization and dimer- 
dimer stacking interactions (34,42). Given that PfAlba2 
and PfAlba4 interacted reciprocally in a genome-wide 
P. falciparum yeast two-hybrid screen (43), PfAlba 
heterodimer formation possibly occurs in vivo influencing 
DNA/RNA affinity and/or sequence specificity. Finally, in 
archaea, Albas bind to DNA without sequence specificity 
(35). Genome-wide analysis of Sso10b distribution by 
chromatin immunoprecipitation demonstrated that the 
protein was uniformly and ubiquitously distributed on 
the chromosome of S. solfataricus (35). Nevertheless, a 
recent study showed that an Alba homolog, Mma10b, in 
the mesophile Methanococcus maripaludis binds to an 
18-bp degenerate DNA motif with apparent sequence spe- 
cificity (44). This indicates that the DNA specificity and 
functions of Alba homologs have greatly diverged 
amongst different organisms. To begin to study this, 
we transfected wild-type 3D7 parasites with Ty1 
epitope-tagged versions of PfAlba1 and PfAlba4 
(Supplementary Figure S5) and performed ChIP-seq 
analysis using anti-Ty1 antibodies. However, we were 
unable to obtain enrichment of the PfAlbas at specific 
genomic loci under the preliminary conditions tested 
(data not shown). That the tagged proteins localized to 
the same cellular compartments as their endogenous 
PfAlba counterparts (Supplementary Figure S5D) suggests 
that they are functional and provides additional 
support for a perinuclear role for the PfAlbas in ring stage 
parasites. Overall, we hypothesize that the PfAlbas con- 
tribute to the recruitment of epigenetic factors to the 
PERC, although other functions related to internal 
chromosomal regions that localize to the nuclear periph- 
ery cannot be ruled out. 

As stated above, the differential cellular localization of 
the PfAlbas during distinct stages of the 48-h intra- 
erythrocytic parasite growth cycle may provide vital 
clues to the in vivo functions of these proteins. In rings, 



Nucleic Acids Research, 2012, Vol. 40, No. 7 3075 



PfAlba1, PfAlba2 and PfAlba4 localization is restricted to 
the nuclear periphery (Figure 3A and Supplementary 
Figure S3) and this is presumably required for chromatin 
regulation. Around 18 h post-invasion, i.e. in trophozoites, 
the majority of PfAlba1, PfAlba2 and PfAlba4 are 
found as cytoplasmic speckles (>50% based on western 
blot analysis in Figure 3C), which persists during schizog- 
ony. We therefore hypothesize that the PfAlbas perform 
RNA-dependent functions in the cytoplasm of blood 
stages. This is supported by our observation that the 
PfAlbas directly bind to ssRNA (Figure 4A). The only 
published example of Plasmodial Albas comes from the 
rodent malaria parasite P. berghei, where three Alba 
homologs co-purified with components of cytoplasmic 
P-granules from sexual stages, i.e. gametocytes (36). 
Although no direct PbAlba-RNA binding was shown, 
the authors proposed that the PbAlbas participate in the 
repression of certain stage-specific mRNA species. 
However, translational repression has not been 
demonstrated for P. falciparum asexual blood stages 
indicating that PfAlba-RNA interaction could modulate 
additional cellular processes in this parasite. TARE1-6 sequences 
do not exist in rodent malaria parasites and it will 
be interesting to explore if P. berghei Albas bind to DNA. 
Phylogenetic analysis categorized the Alba-like domains 
of the PfAlbas as belonging to the RPP20 (PfAlba3&4) 
and RPP25 (PfAlba1&2) RNase-P superfamilies (41), 
which are involved in the processing of tRNAs and ribosomal 
RNAs by forming heterodimers (45), and suggested 
that the plasmodial Albas may preferentially bind to 
RNA. Such an interaction with RNA could stabilize 
specific RNA molecules or enable RNA secondary struc- 
ture formation, in turn affecting PfAlba localization. An 
RNA 'chaperoning' role has previously been described for 
the Hfq protein in E. coli (46). Another possibility is that 
post-translational modifications such as acetylation, 
phosphorylation and/or methylation could modulate 
PfAlbas' affinity for DNA versus RNA in a stage-specific 
manner and trigger transport from the nucleus to the cyto- 
plasm. This is not unprecedented because deacetylation of 
Sso10b by Sir2 has been shown to increase Sso10b's 
DNA-binding affinity (40) and phosphorylation and 
methylation of arginine residues in the RGG domain of 
yeast Npl3p have been shown to alter its nuclear import 
by affecting interaction with Mtr10p (47). Finally, RNA 
binding may also be an important determinant of PfAlbas' 
nuclear function. Recently, non-coding RNA (ncRNA) 
transcripts corresponding to TARE6 and the adjacent 
var intronic regions have been identified in P. falciparum 
(48,49). Given that the PfAlbas bind to RNA in vitro 
(Figure 4), these proteins may provide a direct link 
between the previously described subtelomeric non-coding 
RNA and the regulation of antigenic variation. 

In conclusion, the characterization of the P. falciparum 
Albas reveals for the first time that Alba-like proteins are 
capable of DNA and RNA binding in a eukaryotic 
organism. Our data implicate a dual function for the 
PfAlbas in chromatin biology and in RNA regulation. 
Overall, this opens up new avenues to explore key 
aspects of not just the biology of chromosome ends such 
as the molecular mechanism of antigenic variation and the 



spatial organization of P. falciparum chromosomes into 
perinuclear foci, but also the biology of 
post-transcriptional gene regulation. Our inability to 
obtain PfAlba knockout parasites suggests that the 
PfAlbas are crucial for parasite survival. Inducible 
protein knockdown parasite lines for PfAlbas are now 
needed to unravel their physiological role during the 
various stages of the intra-erythrocytic cycle of 
P. falciparum. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR online: 
Supplementary Figures S1-S5 and Supplementary Tables 
S1-S4. 